Enabling vGPU in Proxmox VE 7.1

2023-05-05 11:46 | Source: compiled from the web

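1: Understanding NVIDIA vGPU

How NVIDIA vGPU works: you install the vGPU driver on the host and use the NVIDIA vGPU Manager to control the vGPUs. It creates multiple mdev devices, i.e. the vGPUs, which are passed through to virtual machines; inside each VM, the regular NVIDIA driver drives its vGPU. This is somewhat similar to Intel GVT-g. The key component here is the NVIDIA vGPU Manager.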

After the NVIDIA vGPU driver is installed on the host, two services are present:

nvidia-vgpud.service
nvidia-vgpu-mgr.service

In brief, this is what the two services do when vGPU starts up:

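1. With a genuine vGPU card, the normal flow is: after boot, the nvidia-vgpud service queries all GPUs known to the kernel and checks their vGPU capability. If it finds a vGPU-capable GPU, nvidia-vgpud creates the MDEV devices, and the system creates the /sys/class/mdev_bus directory.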

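2. These devices are assigned to VMs. When a VM starts, it opens its MDEV device, and nvidia-vgpu-mgr talks to the kernel via ioctl. When nvidia-vgpu-mgr asks whether the GPU supports vGPU, the answer is yes, and it then tries to initialize the vGPU device.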
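Following the flow described above, a quick way to see whether a vGPU-capable GPU was registered is to look for entries under /sys/class/mdev_bus. The sketch below is a hypothetical POSIX shell helper, not part of the NVIDIA tooling; the function name `list_mdev_parents` and the optional directory argument (added only so it can be tried against a scratch directory) are assumptions.

```shell
# list_mdev_parents [DIR]: hypothetical helper, not an NVIDIA tool.
# Prints the PCI addresses of GPUs registered on the mdev bus.
# DIR defaults to /sys/class/mdev_bus; override it only for testing.
list_mdev_parents() {
  mdev_base=${1:-/sys/class/mdev_bus}
  if [ ! -d "$mdev_base" ]; then
    # Directory missing: no vGPU-capable GPU has been registered.
    echo "no mdev_bus at $mdev_base" >&2
    return 1
  fi
  for mdev_dev in "$mdev_base"/*; do
    [ -e "$mdev_dev" ] || continue   # skip the literal glob if empty
    echo "${mdev_dev##*/}"           # strip the directory part
  done
}
```

On a working host this should print an address such as 0000:05:00.0; if it fails, `systemctl status nvidia-vgpud nvidia-vgpu-mgr` is the first thing to check.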

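At present, vgpu_unlock supports only time-sliced vGPU, which means each vGPU instance's share of the GPU's performance is allocated dynamically. Taking a P4 as an example: with a single vGPU instance, that instance gets close to 100% of the performance; with two instances running at the same time, each gets half.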

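Per NVIDIA's vGPU limits, each vGPU instance needs at least 1 GB of VRAM. For example, an 8 GB P4 can run at most eight 1 GB vGPU instances at the same time.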
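Both rules above are plain arithmetic: under time-slicing, each of n busy instances gets roughly 1/n of the GPU, while VRAM is split statically, so the number of instances for one profile is total VRAM divided by the profile's VRAM. A minimal sketch with hypothetical function names follows. The same division appears to explain the `mdevctl types` output in section 6, whose counts (12 × 2048M, 24 × 1024M) match the 24 GB Tesla P40 the card is presented as, rather than the 3 GB physical card.

```shell
# Hypothetical helpers illustrating the time-slicing and VRAM rules.

# share_percent N: approximate GPU time (%) per instance with N busy instances.
share_percent() {
  echo $((100 / $1))
}

# max_instances TOTAL_MIB PROFILE_MIB: how many instances of one profile fit.
max_instances() {
  echo $(($1 / $2))
}
```

For example, `max_instances 8192 1024` gives the eight 1 GB instances of an 8 GB P4 mentioned above.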

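2: How vgpu_unlock works

Recall the vGPU startup flow above. With a consumer card, nvidia-vgpud checks the card type and, seeing a consumer card, naturally does not create any mdev devices. vgpu_unlock intercepts nvidia-vgpud's calls and fools it: "This is a vGPU card, go ahead and create the mdev devices!"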

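When the mdev device is passed through to a VM and the VM starts, vgpu_unlock intercepts the nvidia-vgpu-mgr service as well and tells it: "The GPU supports vGPU, go ahead and initialize the device!"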

3: GPUs supported by vGPU_unlock

Each entry reads [PCI device ID] GPU chip [model] -> the vGPU card it is presented as.

[21c4] TU116 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000

[21d1] TU116BM [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000

[21c2] TU116 -> Quadro RTX 6000

[2182] TU116 [GeForce GTX 1660 Ti] -> Quadro RTX 6000

[2183] TU116 -> Quadro RTX 6000

[2184] TU116 [GeForce GTX 1660] -> Quadro RTX 6000

[2187] TU116 [GeForce GTX 1650 SUPER] -> Quadro RTX 6000

[2188] TU116 [GeForce GTX 1650] -> Quadro RTX 6000

[2191] TU116M [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000

[2192] TU116M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000

[21ae] TU116GL -> Quadro RTX 6000

[21bf] TU116GL -> Quadro RTX 6000

[2189] TU116 [CMP 30HX] -> Quadro RTX 6000

[1fbf] TU117GL -> Quadro RTX 6000

[1fbb] TU117GLM [Quadro T500 Mobile] -> Quadro RTX 6000

[1fd9] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000

[1ff9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000

[1fdd] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000

[1f96] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000

[1f99] TU117M -> Quadro RTX 6000

[1fae] TU117GL -> Quadro RTX 6000

[1fb8] TU117GLM [Quadro T2000 Mobile / Max-Q] -> Quadro RTX 6000

[1fb9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000

[1f97] TU117M [GeForce MX450] -> Quadro RTX 6000

[1f98] TU117M [GeForce MX450] -> Quadro RTX 6000

[1f9c] TU117M [GeForce MX450] -> Quadro RTX 6000

[1f9d] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000

[1fb0] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000

[1fb1] TU117GL [T600] -> Quadro RTX 6000

[1fb2] TU117GLM [Quadro T400 Mobile] -> Quadro RTX 6000

[1fba] TU117GLM [T600 Mobile] -> Quadro RTX 6000

[1f42] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000

[1f47] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000

[1f50] TU106BM [GeForce RTX 2070 Mobile / Max-Q] -> Quadro RTX 6000

[1f51] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000

[1f54] TU106BM [GeForce RTX 2070 Mobile] -> Quadro RTX 6000

[1f55] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000

[1f81] TU117 -> Quadro RTX 6000

[1f82] TU117 [GeForce GTX 1650] -> Quadro RTX 6000

[1f91] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000

[1f92] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000

[1f94] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000

[1f95] TU117M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000

[1f76] TU106GLM [Quadro RTX 3000 Mobile Refresh] -> Quadro RTX 6000

[1f07] TU106 [GeForce RTX 2070 Rev. A] -> Quadro RTX 6000

[1f08] TU106 [GeForce RTX 2060 Rev. A] -> Quadro RTX 6000

[1f09] TU106 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000

[1f0a] TU106 [GeForce GTX 1650] -> Quadro RTX 6000

[1f10] TU106M [GeForce RTX 2070 Mobile] -> Quadro RTX 6000

[1f11] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000

[1f12] TU106M [GeForce RTX 2060 Max-Q] -> Quadro RTX 6000

[1f14] TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] -> Quadro RTX 6000

[1f15] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000

[1f2e] TU106M -> Quadro RTX 6000

[1f36] TU106GLM [Quadro RTX 3000 Mobile / Max-Q] -> Quadro RTX 6000

[1f0b] TU106 [CMP 40HX] -> Quadro RTX 6000

[1eb5] TU104GLM [Quadro RTX 5000 Mobile / Max-Q] -> Quadro RTX 6000

[1eb6] TU104GLM [Quadro RTX 4000 Mobile / Max-Q] -> Quadro RTX 6000

[1eb8] TU104GL [Tesla T4] -> Quadro RTX 6000

[1eb9] TU104GL -> Quadro RTX 6000

[1ebe] TU104GL -> Quadro RTX 6000

[1ec2] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000

[1ec7] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000

[1ed0] TU104BM [GeForce RTX 2080 Mobile] -> Quadro RTX 6000

[1ed1] TU104BM [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000

[1ed3] TU104BM [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000

[1f02] TU106 [GeForce RTX 2070] -> Quadro RTX 6000

[1f04] TU106 -> Quadro RTX 6000

[1f06] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000

[1ef5] TU104GLM [Quadro RTX 5000 Mobile Refresh] -> Quadro RTX 6000

[1e81] TU104 [GeForce RTX 2080 SUPER] -> Quadro RTX 6000

[1e82] TU104 [GeForce RTX 2080] -> Quadro RTX 6000

[1e84] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000

[1e87] TU104 [GeForce RTX 2080 Rev. A] -> Quadro RTX 6000

[1e89] TU104 [GeForce RTX 2060] -> Quadro RTX 6000

[1e90] TU104M [GeForce RTX 2080 Mobile] -> Quadro RTX 6000

[1e91] TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000

[1e93] TU104M [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000

[1eab] TU104M -> Quadro RTX 6000

[1eae] TU104M -> Quadro RTX 6000

[1eb0] TU104GL [Quadro RTX 5000] -> Quadro RTX 6000

[1eb1] TU104GL [Quadro RTX 4000] -> Quadro RTX 6000

[1eb4] TU104GL [T4G] -> Quadro RTX 6000

[1e04] TU102 [GeForce RTX 2080 Ti] -> Quadro RTX 6000

[1e07] TU102 [GeForce RTX 2080 Ti Rev. A] -> Quadro RTX 6000

[1e2d] TU102 [GeForce RTX 2080 Ti Engineering Sample] -> Quadro RTX 6000

[1e2e] TU102 [GeForce RTX 2080 Ti 12GB Engineering Sample] -> Quadro RTX 6000

[1e30] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000

[1e36] TU102GL [Quadro RTX 6000] -> Quadro RTX 6000

[1e37] TU102GL [GRID RTX T10-4/T10-8/T10-16] -> Quadro RTX 6000

[1e38] TU102GL -> Quadro RTX 6000

[1e3c] TU102GL -> Quadro RTX 6000

[1e3d] TU102GL -> Quadro RTX 6000

[1e3e] TU102GL -> Quadro RTX 6000

[1e78] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000

[1e09] TU102 [CMP 50HX] -> Quadro RTX 6000

[1dba] GV100GL [Quadro GV100] -> Tesla V100 32GB PCIE

[1e02] TU102 [TITAN RTX] -> Quadro RTX 6000

[1cfa] GP107GL [Quadro P2000] -> Tesla P40

[1cfb] GP107GL [Quadro P1000] -> Tesla P40

[1d01] GP108 [GeForce GT 1030] -> Tesla P40

[1d10] GP108M [GeForce MX150] -> Tesla P40

[1d11] GP108M [GeForce MX230] -> Tesla P40

[1d12] GP108M [GeForce MX150] -> Tesla P40

[1d13] GP108M [GeForce MX250] -> Tesla P40

[1d16] GP108M [GeForce MX330] -> Tesla P40

[1d33] GP108GLM [Quadro P500 Mobile] -> Tesla P40

[1d34] GP108GLM [Quadro P520] -> Tesla P40

[1d52] GP108BM [GeForce MX250] -> Tesla P40

[1d56] GP108BM [GeForce MX330] -> Tesla P40

[1d81] GV100 [TITAN V] -> Tesla V100 32GB PCIE

[1cb6] GP107GL [Quadro P620] -> Tesla P40

[1cba] GP107GLM [Quadro P2000 Mobile] -> Tesla P40

[1cbb] GP107GLM [Quadro P1000 Mobile] -> Tesla P40

[1cbc] GP107GLM [Quadro P600 Mobile] -> Tesla P40

[1cbd] GP107GLM [Quadro P620] -> Tesla P40

[1ccc] GP107BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40

[1ccd] GP107BM [GeForce GTX 1050 Mobile] -> Tesla P40

[1ca8] GP107GL -> Tesla P40

[1caa] GP107GL -> Tesla P40

[1cb1] GP107GL [Quadro P1000] -> Tesla P40

[1cb2] GP107GL [Quadro P600] -> Tesla P40

[1cb3] GP107GL [Quadro P400] -> Tesla P40

[1c70] GP106GL -> Tesla P40

[1c81] GP107 [GeForce GTX 1050] -> Tesla P40

[1c82] GP107 [GeForce GTX 1050 Ti] -> Tesla P40

[1c83] GP107 [GeForce GTX 1050 3GB] -> Tesla P40

[1c8c] GP107M [GeForce GTX 1050 Ti Mobile] -> Tesla P40

[1c8d] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40

[1c8e] GP107M -> Tesla P40

[1c8f] GP107M [GeForce GTX 1050 Ti Max-Q] -> Tesla P40

[1c90] GP107M [GeForce MX150] -> Tesla P40

[1c91] GP107M [GeForce GTX 1050 3 GB Max-Q] -> Tesla P40

[1c92] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40

[1c94] GP107M [GeForce MX350] -> Tesla P40

[1c96] GP107M [GeForce MX350] -> Tesla P40

[1ca7] GP107GL -> Tesla P40

[1c36] GP106 [P106M] -> Tesla P40

[1c07] GP106 [P106-100] -> Tesla P40

[1c09] GP106 [P106-090] -> Tesla P40

[1c20] GP106M [GeForce GTX 1060 Mobile] -> Tesla P40

[1c21] GP106M [GeForce GTX 1050 Ti Mobile] -> Tesla P40

[1c22] GP106M [GeForce GTX 1050 Mobile] -> Tesla P40

[1c23] GP106M [GeForce GTX 1060 Mobile Rev. 2] -> Tesla P40

[1c2d] GP106M -> Tesla P40

[1c30] GP106GL [Quadro P2000] -> Tesla P40

[1c31] GP106GL [Quadro P2200] -> Tesla P40

[1c35] GP106M [Quadro P2000 Mobile] -> Tesla P40

[1c60] GP106BM [GeForce GTX 1060 Mobile 6GB] -> Tesla P40

[1c61] GP106BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40

[1c62] GP106BM [GeForce GTX 1050 Mobile] -> Tesla P40

[1bb8] GP104GLM [Quadro P3000 Mobile] -> Tesla P40

[1bb9] GP104GLM [Quadro P4200 Mobile] -> Tesla P40

[1bbb] GP104GLM [Quadro P3200 Mobile] -> Tesla P40

[1bc7] GP104 [P104-101] -> Tesla P40

[1be0] GP104BM [GeForce GTX 1080 Mobile] -> Tesla P40

[1be1] GP104BM [GeForce GTX 1070 Mobile] -> Tesla P40

[1c00] GP106 -> Tesla P40

[1c01] GP106 -> Tesla P40

[1c02] GP106 [GeForce GTX 1060 3GB] -> Tesla P40

[1c03] GP106 [GeForce GTX 1060 6GB] -> Tesla P40

[1c04] GP106 [GeForce GTX 1060 5GB] -> Tesla P40

[1c06] GP106 [GeForce GTX 1060 6GB Rev. 2] -> Tesla P40

[1b87] GP104 [P104-100] -> Tesla P40

[1ba0] GP104M [GeForce GTX 1080 Mobile] -> Tesla P40

[1ba1] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40

[1ba2] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40

[1ba9] GP104M -> Tesla P40

[1baa] GP104M -> Tesla P40

[1bad] GP104 [GeForce GTX 1070 Engineering Sample] -> Tesla P40

[1bb0] GP104GL [Quadro P5000] -> Tesla P40

[1bb1] GP104GL [Quadro P4000] -> Tesla P40

[1bb3] GP104GL [Tesla P4] -> Tesla P40

[1bb4] GP104GL [Tesla P6] -> Tesla P40

[1bb5] GP104GLM [Quadro P5200 Mobile] -> Tesla P40

[1bb6] GP104GLM [Quadro P5000 Mobile] -> Tesla P40

[1bb7] GP104GLM [Quadro P4000 Mobile] -> Tesla P40

[1b06] GP102 [GeForce GTX 1080 Ti] -> Tesla P40

[1b07] GP102 [P102-100] -> Tesla P40

[1b30] GP102GL [Quadro P6000] -> Tesla P40

[1b38] GP102GL [Tesla P40] -> Tesla P40

[1b70] GP102GL -> Tesla P40

[1b78] GP102GL -> Tesla P40

[1b80] GP104 [GeForce GTX 1080] -> Tesla P40

[1b81] GP104 [GeForce GTX 1070] -> Tesla P40

[1b82] GP104 [GeForce GTX 1070 Ti] -> Tesla P40

[1b83] GP104 [GeForce GTX 1060 6GB] -> Tesla P40

[1b84] GP104 [GeForce GTX 1060 3GB] -> Tesla P40

[1b39] GP102GL [Tesla P10] -> Tesla P40

[1b00] GP102 [TITAN X] -> Tesla P40

[1b01] GP102 [GeForce GTX 1080 Ti 10GB] -> Tesla P40

[1b02] GP102 [TITAN Xp] -> Tesla P40

[1b04] GP102 -> Tesla P40

[179c] GM107 [GeForce 940MX] -> Tesla M10

[17c2] GM200 [GeForce GTX TITAN X] -> Tesla M60

[17c8] GM200 [GeForce GTX 980 Ti] -> Tesla M60

[17f0] GM200GL [Quadro M6000] -> Tesla M60

[17f1] GM200GL [Quadro M6000 24GB] -> Tesla M60

[17fd] GM200GL [Tesla M40] -> Tesla M60

[1617] GM204M [GeForce GTX 980M] -> Tesla M60

[1618] GM204M [GeForce GTX 970M] -> Tesla M60

[1619] GM204M [GeForce GTX 965M] -> Tesla M60

[161a] GM204M [GeForce GTX 980 Mobile] -> Tesla M60

[1667] GM204M [GeForce GTX 965M] -> Tesla M60

[1725] GP100 -> Tesla P40

[172e] GP100 -> Tesla P40

[172f] GP100 -> Tesla P40

[174d] GM108M [GeForce MX130] -> Tesla M10

[174e] GM108M [GeForce MX110] -> Tesla M10

[1789] GM107GL [GRID M3-3020] -> Tesla M10

[1402] GM206 [GeForce GTX 950] -> Tesla M60

[1406] GM206 [GeForce GTX 960 OEM] -> Tesla M60

[1407] GM206 [GeForce GTX 750 v2] -> Tesla M60

[1427] GM206M [GeForce GTX 965M] -> Tesla M60

[1430] GM206GL [Quadro M2000] -> Tesla M60

[1431] GM206GL [Tesla M4] -> Tesla M60

[1436] GM206GLM [Quadro M2200 Mobile] -> Tesla M60

[15f0] GP100GL [Quadro GP100] -> Tesla P40

[15f1] GP100GL -> Tesla P40

[1404] GM206 [GeForce GTX 960 FAKE] -> Tesla M60

[13d8] GM204M [GeForce GTX 970M] -> Tesla M60

[13d9] GM204M [GeForce GTX 965M] -> Tesla M60

[13da] GM204M [GeForce GTX 980 Mobile] -> Tesla M60

[13e7] GM204GL [GeForce GTX 980 Engineering Sample] -> Tesla M60

[13f0] GM204GL [Quadro M5000] -> Tesla M60

[13f1] GM204GL [Quadro M4000] -> Tesla M60

[13f2] GM204GL [Tesla M60] -> Tesla M60

[13f3] GM204GL [Tesla M6] -> Tesla M60

[13f8] GM204GLM [Quadro M5000M / M5000 SE] -> Tesla M60

[13f9] GM204GLM [Quadro M4000M] -> Tesla M60

[13fa] GM204GLM [Quadro M3000M] -> Tesla M60

[13fb] GM204GLM [Quadro M5500] -> Tesla M60

[1401] GM206 [GeForce GTX 960] -> Tesla M60

[13b3] GM107GLM [Quadro K2200M] -> Tesla M10

[13b4] GM107GLM [Quadro M620 Mobile] -> Tesla M10

[13b6] GM107GLM [Quadro M1200 Mobile] -> Tesla M10

[13b9] GM107GL [NVS 810] -> Tesla M10

[13ba] GM107GL [Quadro K2200] -> Tesla M10

[13bb] GM107GL [Quadro K620] -> Tesla M10

[13bc] GM107GL [Quadro K1200] -> Tesla M10

[13bd] GM107GL [Tesla M10] -> Tesla M10

[13c0] GM204 [GeForce GTX 980] -> Tesla M60

[13c1] GM204 -> Tesla M60

[13c2] GM204 [GeForce GTX 970] -> Tesla M60

[13c3] GM204 -> Tesla M60

[13d7] GM204M [GeForce GTX 980M] -> Tesla M60

[1389] GM107GL [GRID M30] -> Tesla M10

[1390] GM107M [GeForce 845M] -> Tesla M10

[1391] GM107M [GeForce GTX 850M] -> Tesla M10

[1392] GM107M [GeForce GTX 860M] -> Tesla M10

[1393] GM107M [GeForce 840M] -> Tesla M10

[1398] GM107M [GeForce 845M] -> Tesla M10

[1399] GM107M [GeForce 945M] -> Tesla M10

[139a] GM107M [GeForce GTX 950M] -> Tesla M10

[139b] GM107M [GeForce GTX 960M] -> Tesla M10

[139c] GM107M [GeForce 940M] -> Tesla M10

[139d] GM107M [GeForce GTX 750 Ti] -> Tesla M10

[13b0] GM107GLM [Quadro M2000M] -> Tesla M10

[13b1] GM107GLM [Quadro M1000M] -> Tesla M10

[13b2] GM107GLM [Quadro M600M] -> Tesla M10

[1347] GM108M [GeForce 940M] -> Tesla M10

[1348] GM108M [GeForce 945M / 945A] -> Tesla M10

[1349] GM108M [GeForce 930M] -> Tesla M10

[134b] GM108M [GeForce 940MX] -> Tesla M10

[134d] GM108M [GeForce 940MX] -> Tesla M10

[134e] GM108M [GeForce 930MX] -> Tesla M10

[134f] GM108M [GeForce 920MX] -> Tesla M10

[137a] GM108GLM [Quadro K620M / Quadro M500M] -> Tesla M10

[137b] GM108GLM [Quadro M520 Mobile] -> Tesla M10

[137d] GM108M [GeForce 940A] -> Tesla M10

[1380] GM107 [GeForce GTX 750 Ti] -> Tesla M10

[1381] GM107 [GeForce GTX 750] -> Tesla M10

[1382] GM107 [GeForce GTX 745] -> Tesla M10

[1340] GM108M [GeForce 830M] -> Tesla M10

[1341] GM108M [GeForce 840M] -> Tesla M10

[1344] GM108M [GeForce 845M] -> Tesla M10

[1346] GM108M [GeForce 930M] -> Tesla M10
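To see where your own card lands in this table, read its device ID with `lspci -nn` (NVIDIA devices show up as `[10de:xxxx]`) and look it up. The function below is a hypothetical illustration that hard-codes only a handful of entries copied from the table above; it is not vgpu_unlock's own lookup code.

```shell
# spoof_target ID: hypothetical lookup over a few entries of the table above.
# ID is the 4-digit lowercase PCI device ID, e.g. 1c09 for a P106-090.
spoof_target() {
  case $1 in
    1e04|1e07|1e82|1f02|2184) echo "Quadro RTX 6000" ;;      # Turing (TU1xx)
    1b80|1b81|1c03|1c09|1bb3) echo "Tesla P40" ;;            # Pascal (GP10x)
    13c0|13c2|17c8|1401)      echo "Tesla M60" ;;            # Maxwell GM20x
    1380|1381|139a)           echo "Tesla M10" ;;            # Maxwell GM107
    1d81|1dba)                echo "Tesla V100 32GB PCIE" ;; # Volta (GV100)
    *) echo "not in this excerpt" >&2; return 1 ;;
  esac
}
```

For example, `spoof_target 1c09` prints Tesla P40, which matches the GRID P40 profiles the P106-090 exposes in section 6.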

4: Preparing the environment

4.1 Configure the package sources

rm /etc/apt/sources.list
rm /etc/apt/sources.list.d/*
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free" >> /etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free" >> /etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free" >> /etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bullseye-security main contrib non-free" >> /etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/proxmox/debian bullseye pve-no-subscription" >> /etc/apt/sources.list
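The five `echo ... >>` lines append, so running them twice duplicates every entry. A minimal alternative sketch writes the same five lines with one heredoc and overwrites the file instead; the function name `write_sources` and its optional target-path argument are assumptions for illustration.

```shell
# write_sources [FILE]: hypothetical helper; writes the five apt entries above.
# Overwrites FILE (default /etc/apt/sources.list), so re-running is harmless.
write_sources() {
  target=${1:-/etc/apt/sources.list}
  mirror=https://mirrors.tuna.tsinghua.edu.cn
  cat > "$target" <<EOF
deb $mirror/debian/ bullseye main contrib non-free
deb $mirror/debian/ bullseye-updates main contrib non-free
deb $mirror/debian/ bullseye-backports main contrib non-free
deb $mirror/debian-security bullseye-security main contrib non-free
deb $mirror/proxmox/debian bullseye pve-no-subscription
EOF
}
```

The `rm /etc/apt/sources.list.d/*` step is still needed to drop the other repository files.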

4.2 Install the required packages

apt update && apt install dkms git build-essential pve-kernel-5.15 pve-headers-5.15 cargo jq uuid-runtime -y

Install mdevctl:

wget -P /opt/ http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
dpkg -i /opt/mdevctl_0.81-1_all.deb

4.3 Configure the kernel

echo vfio >> /etc/modules
echo vfio_iommu_type1 >> /etc/modules
echo vfio_pci >> /etc/modules
echo vfio_virqfd >> /etc/modules
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
update-initramfs -k all -u
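Likewise, these `>>` appends add duplicate lines if the step is repeated. A small idempotent helper (hypothetical name `ensure_line`) appends a line only when an identical line is not already present; `update-initramfs -k all -u` is still run afterwards as above.

```shell
# ensure_line FILE LINE: hypothetical helper; append LINE to FILE unless an
# identical line already exists (creates FILE if it is missing).
ensure_line() {
  el_file=$1
  el_line=$2
  # -x: whole-line match, -F: fixed string, -q: quiet
  grep -qxF -- "$el_line" "$el_file" 2>/dev/null ||
    printf '%s\n' "$el_line" >> "$el_file"
}

# Example (not run here): the entries from this step.
# for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
#   ensure_line /etc/modules "$m"
# done
# ensure_line /etc/modprobe.d/blacklist.conf "blacklist nouveau"
```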

4.4 Configure the bootloader

Edit GRUB. Don't change it blindly; pick the setting that matches your hardware.

vi /etc/default/grub
# Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# and change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# For an AMD CPU, use this instead:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
# Update the bootloader
update-grub
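The Intel/AMD choice can also be read from /proc/cpuinfo. The sketch below uses hypothetical function names and takes file paths as arguments, so it can be tried on a copy of /etc/default/grub first. It picks the parameter and adds it inside GRUB_CMDLINE_LINUX_DEFAULT only if it is missing. It assumes GNU `sed -i`, as found on Proxmox VE; run `update-grub` afterwards as above.

```shell
# iommu_param [CPUINFO]: hypothetical; print the IOMMU flag for this CPU vendor.
iommu_param() {
  cpuinfo=${1:-/proc/cpuinfo}
  if grep -q GenuineIntel "$cpuinfo"; then
    echo intel_iommu=on
  elif grep -q AuthenticAMD "$cpuinfo"; then
    echo amd_iommu=on
  else
    return 1
  fi
}

# add_kernel_param GRUBFILE PARAM: hypothetical; append PARAM inside the
# quoted GRUB_CMDLINE_LINUX_DEFAULT value unless it is already there.
add_kernel_param() {
  grubfile=$1
  param=$2
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$grubfile"; then
    return 0
  fi
  sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$grubfile"
}

# Example (not run here):
# add_kernel_param /etc/default/grub "$(iommu_param)" && update-grub
```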

4.5 Install the driver

Reboot the host. After it comes back up, verify that the running kernel is 5.15:

root@pve:~# uname -r
5.15.30-2-pve

If it shows 5.15, you are on the right kernel.
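For scripted setups, the same check can be written as a tiny function (hypothetical name `is_kernel_515`). It takes a release string as an argument for testing and otherwise uses `uname -r`.

```shell
# is_kernel_515 [RELEASE]: hypothetical; succeed if the kernel release
# (default: the running kernel) belongs to the 5.15 series.
is_kernel_515() {
  case ${1:-$(uname -r)} in
    5.15.*) return 0 ;;
    *) return 1 ;;
  esac
}
```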

Verify that IOMMU is enabled:

If IOMMU groups like the following appear, it worked:

root@pve3:~# dmesg |grep iommu
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.075784] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.352588] iommu: Default domain type: Passthrough (set via kernel command line)
[ 1.373583] pci 0000:00:00.0: Adding to iommu group 0
[ 1.373592] pci 0000:00:02.0: Adding to iommu group 1
[ 1.373605] pci 0000:00:14.0: Adding to iommu group 2
[ 1.373613] pci 0000:00:17.0: Adding to iommu group 3
[ 1.373623] pci 0000:00:1c.0: Adding to iommu group 4
[ 1.373637] pci 0000:00:1d.0: Adding to iommu group 5
[ 1.373647] pci 0000:00:1d.2: Adding to iommu group 6
[ 1.373656] pci 0000:00:1d.3: Adding to iommu group 7
[ 1.373675] pci 0000:00:1f.0: Adding to iommu group 8
[ 1.373683] pci 0000:00:1f.2: Adding to iommu group 8
[ 1.373691] pci 0000:00:1f.3: Adding to iommu group 8
[ 1.373699] pci 0000:00:1f.4: Adding to iommu group 8
[ 1.373707] pci 0000:00:1f.6: Adding to iommu group 9
[ 1.373717] pci 0000:01:00.0: Adding to iommu group 10
[ 1.373726] pci 0000:03:00.0: Adding to iommu group 11
[ 1.373735] pci 0000:05:00.0: Adding to iommu group 12
[ 1.656483] intel_iommu=on
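Instead of scanning dmesg, the IOMMU groups can also be counted directly in sysfs: each group is a numbered directory under /sys/kernel/iommu_groups, and no directories there means the IOMMU is not active. A minimal sketch (hypothetical function name; the directory argument exists only for testing):

```shell
# count_iommu_groups [DIR]: hypothetical; count IOMMU group directories.
count_iommu_groups() {
  groups_dir=${1:-/sys/kernel/iommu_groups}
  n=0
  for g in "$groups_dir"/*; do
    # An unexpanded glob (no groups) is not a directory, so it is not counted.
    if [ -d "$g" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

In the sample output above, groups 0 through 12 appear, so on that host this would print 13.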

Verify that nouveau is not loaded:

No output means it is not loaded:

root@pve3:~# lsmod|grep nouveau
root@pve3:~#
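lsmod formats /proc/modules, whose first column is the module name, so the same check can be done with an exact name match and a usable exit status. Hypothetical helper; the optional second argument exists only for testing:

```shell
# module_loaded NAME [MODULES_FILE]: hypothetical; exit 0 if NAME is loaded.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
    "${2:-/proc/modules}"
}

# Example (not run here):
# module_loaded nouveau && echo "nouveau is still loaded"
```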

Download the driver:

# Download the driver to /opt
wget https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run -P /opt

Make the driver executable:

chmod +x /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run

Install the driver with DKMS:

sh -c /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run

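After you run the command, the installer asks whether to install the kernel module with DKMS; choose Yes and press Enter to continue.

An X.org warning appears; ignore it.

It then asks whether to install the 32-bit compatibility libraries. Either answer is fine.

The driver installation starts.

When the progress bar finishes, you are done. This can take a while.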

5: Configuring vgpu_unlock

5.1 Build

Download vgpu_unlock-rs:

cd /opt/ && git clone https://github.com/mbilker/vgpu_unlock-rs.git

Build it with cargo:

cd /opt/vgpu_unlock-rs && git checkout v2.0.1 && cargo build --release

The build takes a long time and may need network access; alternatively, you can build it in the cloud with GitHub Actions.

5.2 Install vgpu_unlock

cp /opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so /lib/nvidia/libvgpu_unlock_rs.so

Reboot the host.

6: Verification

After the reboot, run nvidia-smi and confirm that it shows GPU information like the following:

root@pve:~# nvidia-smi
Wed Apr 27 23:33:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  P106-090            Off  | 00000000:05:00.0 Off |                  N/A |
| 31%   35C    P0    28W /  75W |     11MiB /  3071MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Run mdevctl types to verify that the mdev types appear:

root@pve:/opt/vgpu_unlock-rs# mdevctl types
0000:05:00.0
  nvidia-156
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-241
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1B4
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-283
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4C
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
  nvidia-284
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6C
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
  nvidia-285
    Available instances: 3
    Device API: vfio-pci
    Name: GRID P40-8C
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
  nvidia-286
    Available instances: 2
    Device API: vfio-pci
    Name: GRID P40-12C
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
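The `mdevctl types` listing is long. To see just each type's ID, name and remaining instances, it can be condensed with awk. This is a sketch that assumes the layout shown above (type ID on its own line, followed by `Available instances:` and `Name:` lines); `summarize_types` is a hypothetical name.

```shell
# summarize_types: hypothetical; read `mdevctl types` output on stdin and
# print "<type id> <name> <available instances>" for each vGPU type.
summarize_types() {
  awk '
    $1 ~ /^nvidia-[0-9]+$/ { id = $1 }       # e.g. nvidia-156
    /Available instances:/ { avail = $NF }   # last field is the count
    /Name:/ {
      sub(/^.*Name: */, "")                  # keep only the name itself
      print id, $0, avail
    }
  '
}
```

Usage: `mdevctl types | summarize_types`. For the output above, the first line would be `nvidia-156 GRID P40-2B 12`.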

7: Getting started

7.1 Configure vGPU parameters (optional; skip this if you use native vGPU)

# Create the config directory
mkdir /etc/vgpu_unlock
# Create the vGPU config file
touch /etc/vgpu_unlock/profile_override.toml

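Write the vGPU configuration into /etc/vgpu_unlock/profile_override.toml. The vgpu-mgr service reads this file every time a vGPU device starts, so any change takes effect the next time the vGPU starts.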

[profile.nvidia-18]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 0
framebuffer = 12348030976
pci_id = 0x17F011A9
pci_device_id = 0x17F0

Parameter notes:

[profile.nvidia-18]: the configuration for the nvidia-18 vGPU type. If the vGPU type you want to configure is nvidia-46, change it to [profile.nvidia-46]. See 7.2.

num_displays: maximum number of displays

display_width = 1920, display_height = 1080, max_pixels = 2073600: these three set the virtual display's resolution; max_pixels is width × height (1920 × 1080 = 2073600).

cuda_enabled = 1: whether CUDA is enabled

frl_enabled = 0: whether the frame rate is limited; 0 means no limit, otherwise the frame rate is capped (e.g. at 60, 144 or 244)

framebuffer: the VRAM size; see the notes below

pci_id: the combination of SDID and SVID

pci_device_id: the device ID (DID)

7.1.1 framebuffer

framebuffer is the vGPU VRAM size set by the vGPU manager.

Convert units with the online file-size converter (bit, bytes, KB, MB, GB, TB) on BeJSON.com.

Note: a vGPU reserves 128 MB by default, so if you want to change the VRAM size, subtract 128 MB from your target before converting.

For example, if you want 2048 MB of VRAM, use 2048 - 128 = 1920.

Enter 1920 MB on the site above and convert it; the value in bytes is the result you need.

The result is 2013265920.
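The conversion is simply (desired MiB - 128) × 1024 × 1024, so the online converter is optional. A minimal sketch (hypothetical function name) follows. It is consistent with the troubleshooting log in section 9, where the 1 GB P40-1Q profile reports `framebuffer: 0x38000000` (896 MiB) plus `framebuffer_reservation: 0x8000000` (128 MiB).

```shell
# framebuffer_bytes MIB: hypothetical; value for `framebuffer` in
# profile_override.toml. Subtracts the 128 MiB the vGPU reserves,
# then converts MiB to bytes (1 MiB = 1048576 bytes).
framebuffer_bytes() {
  echo $(( ($1 - 128) * 1024 * 1024 ))
}
```

For example, `framebuffer_bytes 2048` prints 2013265920, the same value as above.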

Note! Unless it is really necessary, do not change the VRAM size; otherwise the mdev device fails to initialize.

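7.1.2 pci_id and pci_device_id

Normally, when a vGPU device is passed through to a VM, it carries the vGPU's device ID, so the guest identifies it as a model such as P40-1A or RTX6000-1A. Once you install the NVIDIA vGPU guest driver, it treats the device as a vGPU, including license management.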

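vGPU cards and ordinary consumer cards share the same GPU cores; only the drivers differ, which is what makes their feature sets differ. That is why the vgpu_unlock project exists: to let consumer cards support vGPU too.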

That is the host side.

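On the VM side, a vGPU's core is really the same as the physical card's core, so in theory, if you changed the vGPU's device ID to a consumer card's ID, the consumer driver should also be able to drive it.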

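However, since some professional features are unavailable on consumer cards, it is recommended to change the vGPU's device ID to a professional card's ID instead.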

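The pci_id = 0x17F011A9 and pci_device_id = 0x17F0 entries in the config file are what change the vGPU's device information. The vGPU manager reads these values and rewrites the vGPU configuration accordingly, which makes the result more stable and realistic.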

pci_device_id: the device ID the vGPU reports. Look it up here: https://devicehunt.com/view/type/pci/vendor/10DE/

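Our goal is precisely to rewrite the vGPU information so that the VM recognizes it as a professional card, which bypasses the vGPU driver's restrictions so that no license is needed.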

So choose this device ID according to the core of your physical card.

For example, if you use a GTX 1080 for vGPU, the site above shows that the 1080's core is GP104.

So you should pick a card with a GP104GL core, i.e. the Quadro P5000 or P4000 (device IDs 1bb0 and 1bb1 in the table in section 3).

So if you use a 1080, set pci_device_id = 0x1BB0.

pci_id: the combination of SDID and SVID. pci_id carries the SDID in its upper 16 bits and the SVID in its lower 16 bits, while pci_device_id is the DID.

SDID is the subsystem device ID; it can be the same as the DID.

SVID is the subsystem vendor ID; it can be the same as the VID.

If you don't know these values, you can simply write pci_id = 0x1BB010DE, i.e. the DID followed by NVIDIA's vendor ID 10DE.
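As arithmetic, pci_id is the SDID shifted into the upper 16 bits, followed by the SVID in the lower 16 bits. The helper below (hypothetical name) builds it with printf. Note that vgpu_unlock prints these fields in decimal in its log (section 9): `pci_id: 456659432 -> 472977896` is 0x1B3811E8 -> 0x1C3111E8, i.e. a Tesla P40 (1b38) presented as a Quadro P2200 (1c31) with SVID 0x11E8.

```shell
# make_pci_id SDID SVID: hypothetical; combine two 16-bit IDs into pci_id.
# Arguments may be given in hex (0x...), as printf accepts C-style constants.
make_pci_id() {
  printf '0x%04X%04X\n' "$1" "$2"
}
```

For example, `make_pci_id 0x1BB0 0x10DE` prints 0x1BB010DE, the fallback value above.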

7.2 vGPU types

mdevctl types prints a lot of information, including the vGPU types:

root@pve2:/opt/vgpu_unlock# mdevctl types
0000:01:00.0
  nvidia-156
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12

What does all this mean?

Take an example:

nvidia-257
  Available instances: 4
  Device API: vfio-pci
  Name: GRID RTX6000-2Q
  Description: num_heads=4, frl_config=60, framebuffer=2048MB, max_resolution=7680x4320, max_instance=4

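nvidia-257 --> the vGPU type; Available instances --> how many devices can still be created; Name --> the display name; Description --> the description: framebuffer is the VRAM, frl is presumably the maximum FPS, followed by the maximum resolution and the maximum number of devices.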

Here GRID RTX6000-2Q is the mdev name: RTX6000 is the card name, 2 means 2 GB of VRAM, and Q stands for vWS.

The last letter means:

A = Virtual Applications (vApps)

B = Virtual Desktops (vPC)

C = AI/Machine Learning/Training (vCS or vWS)

Q = Virtual Workstations (vWS) (best performance)

Each GPU model has its own set of vGPU types, e.g. P4-1B for the P4, RTX6000-1B for the RTX 6000, and so on.

What stays the same are the rules above:

By VRAM: P4-1B and P4-1Q both have 1 GB of VRAM.

By function: P4-1B is a vPC device and P4-1Q is a vDWS device; they require different licenses.
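These naming rules can be applied mechanically. The sketch below (hypothetical `decode_profile`, plain POSIX parameter expansion) splits a profile name into card, VRAM and series letter. It assumes names of the form `GRID <card>-<GB><letter>[suffix]`, as in the listings above, and maps the letters exactly as listed in this section.

```shell
# decode_profile NAME: hypothetical; e.g. "GRID RTX6000-2Q" -> "RTX6000 2GB Q vWS".
decode_profile() {
  name=${1#GRID }             # drop the "GRID " prefix
  card=${name%-*}             # text before the last '-'
  spec=${name##*-}            # e.g. 2Q, 1B4, 12C
  gb=${spec%%[!0-9]*}         # leading digits = VRAM in GB
  rest=${spec#"$gb"}          # series letter plus optional suffix
  letter=${rest%"${rest#?}"}  # first character of rest
  case $letter in
    A) series="vApps" ;;
    B) series="vPC" ;;
    C) series="vCS/vWS" ;;
    Q) series="vWS" ;;
    *) series="unknown" ;;
  esac
  echo "$card ${gb}GB $letter $series"
}
```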

At the virtualization level, we only care about the vGPU type, i.e. nvidia-257.

When configuring the vGPU, you have to pick the right type.

In short: find the vGPU type you need in the mdevctl types output, set its parameters in profile_override.toml, and then configure the vGPU in the web UI to complete the deployment.

7.4 Modify the VM configuration (required)

Add the following line to the VM's .conf file:

args: -uuid 00000000-0000-0000-0000-000000000100

Note that the last part of the UUID must be changed to your VMID. If your VMID is 3333, change it to:

args: -uuid 00000000-0000-0000-0000-000000003333

If your VMID is 121, change it to:

args: -uuid 00000000-0000-0000-0000-000000000121

Note: the UUID's length and format must not change; only replace the trailing digits with your VMID.
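Since only the last group changes, the line can be generated from the VMID by zero-padding it to the 12 digits of that group. A minimal sketch (hypothetical function name); the printed line goes into the VM's config, which Proxmox VE keeps at /etc/pve/qemu-server/<vmid>.conf.

```shell
# vgpu_uuid_args VMID: hypothetical; print the args line with the VMID
# zero-padded into the last (12-digit) group of the UUID.
vgpu_uuid_args() {
  printf '%s 00000000-0000-0000-0000-%012d\n' 'args: -uuid' "$1"
}
```

For example, `vgpu_uuid_args 121` prints the VMID 121 line shown above.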

7.5 Create the virtual machine

For vGPU, Windows version 21H1 or later is recommended.

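7.5.1 Create the VM and install the OS

Create a VM. Both SeaBIOS and OVMF work, but the machine type must be Q35! Only switch to i440fx if Q35 really does not work for you. Do not pass through any display device at this stage.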


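Note! Inside the guest, the vGPU acts as a 3D device, so the VM still needs a separate display adapter; in other words, do not set the VM's Display to none!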

After the OS is installed, enable a remote-access method in the guest, such as Remote Desktop, ToDesk, VNC, Sunlogin, or Parsec.

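This is because Windows 10 and similar systems download and install drivers automatically. Once the vGPU is passed through and its driver is installed, the guest runs with two displays, which can leave the PVE web console black or showing only the secondary screen, so you can no longer operate the VM.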

If you have already fallen into this trap, shut down the VM, detach the vGPU, and enable remote access first.

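7.5.2 Pass through the vGPU device

In the web UI, click Add → PCI Device, tick All Functions and PCI-Express, and choose the vGPU type under MDev Type. For which type to choose, see above.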


Now you can start the VM. If you followed the steps above exactly, nothing unexpected should happen.

If you see the following message:

kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:05:00.0/00000000-0000-0000-0000-000000003561,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: warning: vfio 00000000-0000-0000-0000-000000003561: Could not enable error recovery for the device
TASK OK

Don't worry about it. It is only a warning, and the task still ends with TASK OK.

7.5.3 Install the graphics driver

I have put the drivers with the best compatibility on my network drive:

https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/guestdrivers/

Download the driver that matches your setup.

7.6 Accessing the VM

Once the driver is installed, Device Manager should show the vGPU device, presented as a professional card.

The guest will also show two displays.

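Because the vGPU is virtual and cannot output to a physical monitor, access the VM through a remote protocol. Parsec is recommended for streaming, but Parsec relies on NVENC; if your card has no NVENC (the P106, for example), you cannot use Parsec.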

With two displays, it is recommended to set the guest to show only the vGPU display.

9: Troubleshooting

Troubleshooting requires knowledge of KVM, vGPU, and Linux basics.

As mentioned at the beginning, vGPU runs as two services.

You can view their logs with these two commands:

journalctl -u nvidia-vgpud
journalctl -u nvidia-vgpu-mgr

For example, the vGPU initialization part:

Apr 28 00:15:58 pve nvidia-vgpu-mgr[2534]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
# vGPU device created
Apr 28 00:20:17 pve nvidia-vgpu-mgr[2534]: VgpuStart { uuid: {00000000-0000-0000-0000-000000003561}, config_params: "vgpu_type_id=46", unknown_410: [75, 13, 0, 0, 0, 5, 0, 0, 1, 0, 0, 0, 0, 5, 0, 0], }
# Default vGPU configuration
Apr 28 00:20:17 pve nvidia-vgpu-mgr[3528]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
...skipping...
num_displays: 4, display_width: 5120, display_height: 2880, max_pixels: 17694720, frl_config: 60, cuda_enabled: 1, ecc_supported: 1, mig_instance_size: 0, multi_vgpu_supported: 0, pci_id: 0x1b3811e8, pci_device_id: 0x1b38, framebuffer: 0x38000000, mappable_video_size: 0x400000, framebuffer_reservation: 0x8000000, encoder_capacity: 0x64, bar1_length: 0x100, blob: [71, 82, 73, 68, 32, 80, 52, 48, 45, 49, 81, 0, 96, 1, 0, 0, 8, 80, 244, 134, 2, 179, 255, 255, 0, 0, 0, 0, 96, 1, 0> license_type: "NVIDIA RTX Virtual Workstation", }
# Reads /etc/vgpu_unlock/profile_override.toml and overrides the vGPU configuration
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Applying profile nvidia-46 overrides
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/num_displays: 4 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_width: 5120 -> 1920
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_height: 2880 -> 1080
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/max_pixels: 17694720 -> 2073600
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/cuda_enabled: 1 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_id: 456659432 -> 472977896
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_device_id: 6968 -> 7217
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/frl_enabled: 1 -> 0
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0xa0810115 failed.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Setting mappable_cpu_host_aperture to 10M
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): gpu-pci-id : 0x500
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Framebuffer: 0x38000000
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1c31:0x11e8
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## vGPU Manager Information: ########
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 460.73.01
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0x2080012f failed.
# The VM queries the vGPU information
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Cannot query ECC status. vGPU ECC support will be disabled.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vGPU migration disabled
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: display_init inst: 0 successful
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 453.10
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: vGPU version: 0x70001
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Current max guest pfn = 0x17cd58!
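When reading the override part of this log, note that pci_id and pci_device_id are printed in decimal. The sketch below (hypothetical `summarize_patches`, POSIX sh plus sed) extracts the `Patching` lines from log text on stdin and shows those two fields in hex, so they can be compared directly with profile_override.toml. It assumes the `Patching <type>/<field>: old -> new` format seen above.

```shell
# summarize_patches: hypothetical; read nvidia-vgpu-mgr log lines on stdin and
# print "field: old -> new" for each "Patching <type>/<field>: old -> new" line,
# showing pci_id and pci_device_id in hex.
summarize_patches() {
  sed -n 's|.*Patching [^/]*/\([a-z_]*\): \([0-9]*\) -> \([0-9]*\).*|\1 \2 \3|p' |
    while read -r field old new; do
      case $field in
        pci_id|pci_device_id)
          printf '%s: 0x%X -> 0x%X\n' "$field" "$old" "$new" ;;
        *)
          printf '%s: %s -> %s\n' "$field" "$old" "$new" ;;
      esac
    done
}
```

Usage: `journalctl -u nvidia-vgpu-mgr | summarize_patches`. For the log above, the pci_id line becomes `pci_id: 0x1B3811E8 -> 0x1C3111E8`, matching the `Virtual Device Id: 0x1c31:0x11e8` reported afterwards.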

